Annotating Errors in Student Texts: First Experiences and Experiments

Authors

  • Sara Stymne
  • Eva Pettersson
  • Beáta Megyesi
  • Anne Palmér
Abstract

We describe the creation of an annotation layer for word-based writing errors in a corpus of student writing. The texts are written in Swedish by students between 9 and 19 years old. Our main purpose is to identify errors in spelling, split compounds and merged words. We also identify simple word-based grammatical errors, including morphological errors and extra words. In this paper we describe the corpus and the annotation process, including detailed descriptions of the error types and guidelines. We find that this annotation can be performed with substantial inter-annotator agreement, although some issues with the annotation remain. We also report the results of two pilot experiments, on spelling correction and on the consistency of downstream NLP tools, to exemplify the usefulness of the annotated corpus.
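The abstract does not say which statistic underlies the reported inter-annotator agreement. As an illustration only, the sketch below computes Cohen's kappa, one common choice for two annotators, over word-level error labels. The label names (`ok`, `spelling`, `split_compound`, `merged`) and the toy sequences are hypothetical and are not taken from the corpus or its guidelines.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of tokens given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical word-level labels, one per token (not real corpus data).
annotator_1 = ["ok", "spelling", "ok", "split_compound", "ok", "merged", "ok", "ok"]
annotator_2 = ["ok", "spelling", "ok", "ok", "ok", "merged", "ok", "spelling"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

On this toy input the annotators agree on 6 of 8 tokens, and kappa corrects that raw rate for the agreement expected by chance given how often each annotator uses each label.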

Publication date: 2017